TITLE OF THE INVENTION 

ADAPTIVE BEAMFORMING METHOD AND APPARATUS USING FEEDBACK STRUCTURE 

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application claims the priority of Korean Patent Application No. 2003-3258, filed 
on January 17, 2003, in the Korean Intellectual Property Office, the disclosure of which is 
incorporated herein in its entirety by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

[0002] The present invention relates to an adaptive beamformer, and more particularly, to a 
method and apparatus for adaptive beamforming using a feedback structure. 

2. Description of the Related Art 

[0003] Mobile robots have applications in health-related fields, security, home networking, 
entertainment, and so forth, and are the focus of increasing interest. Interaction between 
people and mobile robots is necessary when operating the mobile robots. Like people, a mobile 
robot with a vision system has to recognize people and surroundings, find the position of a 
person talking in the vicinity of the mobile robot, and understand what the person is saying. 

[0004] A voice input system of the mobile robot is indispensable for interaction between man 
and robot and is an important factor affecting autonomous mobility. Important factors affecting 
the voice input system of a mobile robot in an indoor environment are noise, reverberation, and 
distance. There are a variety of noise sources and reverberation due to walls or other objects in 
the indoor environment. Low frequency components of a voice are more attenuated than high 
frequency components with respect to distance. Accordingly, for proper interaction between a 
person and an autonomous mobile robot within a house, a voice input system has to enable the 
robot to recognize the person's voice at a distance of several meters. 

[0005] Such a voice input system generally uses a microphone array comprising at least two 
microphones to improve voice detection and recognition. In order to remove noise components 
contained in a speech signal input via the microphone array, a single channel speech 
enhancement method, an adaptive acoustic noise canceling method, a blind signal separation 
method, and a generalized sidelobe canceling method are employed. 

[0006] The single channel speech enhancement method, disclosed in "Spectral 
Enhancement Based on Global Soft Decision" (IEEE Signal Processing Letters, Vol. 7, No. 5, 
pp. 108-110, 2000) by Nam-Soo Kim and Joon-Hyuk Chang, uses one microphone and 
ensures high performance only when statistical characteristics of noise do not vary with time, 
like stationary background noise. The adaptive acoustic noise canceling method, disclosed in 
"Adaptive Noise Canceling: Principles and Applications" (Proceedings of IEEE, Vol. 63, No. 12, 
pp. 1692-1716, 1975) by B. Widrow et al., uses two microphones. Here, one of the two 
microphones is a reference microphone for receiving only noise. Thus, if only noise cannot be 
received or noise received by the reference microphone contains other noise components, the 
performance of the adaptive acoustic noise canceling method sharply drops. Also, the blind 
signal separation method is difficult to use in the actual environment and to implement real-time 
systems. 

[0007] FIG. 1 is a block diagram of a conventional adaptive beamformer using the 
generalized sidelobe canceling method. The conventional adaptive beamformer includes a 
fixed beamformer (FBF) 11, an adaptive blocking matrix (ABM) 13, and an adaptive multi-input 
canceller (AMC) 15. The generalized sidelobe canceling method is described in more detail in 
"A Robust Adaptive Beamformer For Microphone Arrays With A Blocking Matrix Using 
Constrained Adaptive Filters" (IEEE Trans. Signal Processing, Vol. 47, No. 10, pp. 2677-2684, 
1999) by O. Hoshuyama et al. 

[0008] Referring to FIG. 1, the FBF 11 uses a delay-and-sum beamformer. In other words, 
the FBF 11 obtains the correlation of signals x_m(k), where m is an integer between 1 and M, 
input via microphones and calculates time delays among signals input via the microphones. 
Thereafter, the FBF 11 compensates for signals input via the microphones by the calculated 
time delays, and then adds the signals in order to output a signal b(k) having an improved 
signal-to-noise ratio (SNR). The ABM 13 subtracts the signal b(k) output from the FBF 11 
through adaptive blocking filters (ABFs) from each of the signals whose time delays are 
compensated for in order to maximize noise components. The AMC 15 filters signals z_m(k), 
where m is an integer between 1 and M, output from the ABM 13 through adaptive canceling 
filters (ACFs), and then adds the filtered signals, thereby generating noise components via M 
microphones. Thereafter, a signal output from the AMC 15 is subtracted from the signal b(k), 
which is delayed for a predetermined period of time D, to obtain a signal y(k) in which noise 
components are cancelled. 

[0009] The operations of the ABM 13 and the AMC 15 shown in FIG. 1 will be described in 
more detail with reference to FIG. 2. The operations of the ABM 13 and the AMC 15 are the 
same as in the adaptive acoustic noise canceling method. 

[0010] Referring to FIG. 2, the sizes of the symbols S+N, S, and N denote the relative magnitudes 
of speech and noise signals at specific locations, and the left and right symbols separated 
by a slash '/' denote 'to-be' and 'as-is' states, respectively. 

[0011] An ABF 21 adaptively filters the signal b(k) output from the FBF 11 according to the 
signal output from a first subtractor 23 so that a characteristic of speech components of the 
filtered signal output from the ABF 21 is the same as that of speech components of a 
microphone signal x'_m(k) that is delayed for a predetermined period of time. The first subtractor 
23 subtracts the signal output from the ABF 21 from the microphone signal x'_m(k), where m is 
an integer between 1 and M, to obtain and output a signal z_m(k) which is generated by canceling 
speech components S from the microphone signal x'_m(k). 

[0012] An ACF 25 adaptively filters the signal z_m(k) output from the first subtractor 23 
according to the signal output from a second subtractor 27 so that a characteristic of noise 
components of the filtered signal output from the ACF 25 is the same as that of noise 
components of the signal b(k). The second subtractor 27 subtracts the signal output from the 
ACF 25 from the signal b(k) and outputs a signal y(k) which is generated by canceling noise 
components N from the signal b(k). 

[0013] However, the above-described generalized sidelobe canceling method has the 
following drawbacks. The delay-and-sum beamformer of the FBF 11 has to generate the signal 
b(k) with a very high SNR so that only pure noise signals are input to the AMC 15. However, 
because the delay-and-sum beamformer outputs a signal whose SNR is not very high, the 
overall performance drops. As a result, since the ABM 13 outputs a noise signal containing a 
speech signal, the AMC 15, using the output of the ABM 13, regards speech components 
contained in the signal output from the ABM 13 as noise and cancels them. Therefore, the 
adaptive beamformer finally outputs a speech signal containing noise components. Also, 
because filters used in the generalized sidelobe canceling method have a feedforward 
connection structure, finite impulse response (FIR) filters are employed. When such FIR filters 
are used in the feedforward connection structure, 1000 or more filter taps are needed in a room 
reverberation environment. In addition, in a case where the ABF 21 and the ACF 25 are not 
properly trained, the performance of the adaptive beamformer may deteriorate. Thus, speech 
presence intervals and speech absence intervals are necessary for training the ABF 21 and the 
ACF 25. However, these training intervals are generally unavailable in practice. Moreover, 
because adaptation of the ABM 13 and the AMC 15 has to be alternately performed, a voice 
activity detector (VAD) is needed. In other words, for adaptation of the ABF 21, a speech 
component is a desired signal and a noise component is an undesired signal. On the contrary, 
for adaptation of the ACF 25, a noise component is a desired signal and a speech component is 
an undesired signal. 

SUMMARY OF THE INVENTION 

[0014] The present invention provides a method of adaptive beamforming using a feedback 
structure capable of almost completely canceling noise components contained in a wideband 
speech signal input from a microphone array comprising at least two microphones. 

[0015] The present invention also provides an adaptive beamforming apparatus including a 
feedback structure to cancel noise components contained in wideband speech signals input 
from a microphone array. 

[0016] Additional aspects and/or advantages of the invention will be set forth in part in the 
description which follows and, in part, will be obvious from the description, or may be learned by 
practice of the invention. 

[0017] According to an aspect of the present invention, there is provided an adaptive 
beamforming method including compensating for time delays of M noise-containing speech 
signals input via a microphone array having M microphones (M is an integer greater than or 
equal to 2), and generating a sum signal of the M compensated noise-containing speech 
signals; and extracting pure noise components from the M compensated noise-containing 
speech signals using M adaptive blocking filters that are connected to M adaptive canceling 
filters in a feedback structure and extracting pure speech components from the sum signal using 
the M adaptive canceling filters that are connected to the M adaptive blocking filters in the 
feedback structure. 

[0018] According to another aspect of the present invention, there is also provided an 
adaptive beamforming apparatus including: a fixed beamformer that compensates for time 
delays of M noise-containing speech signals input via a microphone array having M 
microphones (M is an integer greater than or equal to 2), and generates a sum signal of the M 
compensated noise-containing speech signals; and a multi-channel signal separator that 
extracts pure noise components from the M compensated noise-containing speech signals 
using M adaptive blocking filters that are connected to M adaptive canceling filters in a feedback 
structure and extracts pure speech components from the added signal using the M adaptive 
canceling filters that are connected to the M adaptive blocking filters in the feedback structure. 

[0019] In an aspect of the present invention, the multi-channel signal separator includes a 
first filter that filters a noise-removed sum signal through the M adaptive blocking filters; a first 
subtractor that subtracts signals output from the M adaptive blocking filters from the M 
compensated noise-containing speech signals using M subtractors; a second filter that filters M 
subtraction results of the first subtractor through the M adaptive canceling filters; a second 
subtractor that subtracts signals output from the M adaptive canceling filters from the sum signal 
using M subtractors, and inputs M subtraction results to the M adaptive blocking filters as the 
noise-removed sum signal; and a second adder that adds signals output from the M subtractors 
of the second subtractor. 

[0020] In an aspect of the present invention, the multi-channel signal separator includes a 
first filter that filters a noise-removed sum signal through the M adaptive blocking filters; a first 
subtractor that subtracts signals output from the M adaptive blocking filters from the M 
compensated noise-containing speech signals using M subtractors; a second filter that filters 
signals output from the M subtractors of the first subtractor through the M adaptive canceling 
filters; a second adder that adds signals output from M adaptive canceling filters of the second 
filter; and a second subtractor that subtracts signals output from the second adder from the 
signals output from the fixed beamformer and inputs M subtraction results to the M adaptive 
blocking filters as the noise-removed sum signal. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0021] These and/or other aspects and advantages of the invention will become apparent 
and more readily appreciated from the following description of the embodiments, taken in 
conjunction with the accompanying drawings of which: 

FIG. 1 is a block diagram of a conventional adaptive beamformer; 

FIG. 2 is a circuit diagram for explaining a feed-forward structure used in the 
conventional adaptive beamformer shown in FIG. 1; 

FIG. 3 is a circuit diagram explaining a feedback structure according to an embodiment 
of the present invention; 

FIG. 4 is a block diagram of an adaptive beamformer according to an embodiment of the 
present invention; 

FIG. 5 is a block diagram of an adaptive beamformer according to another embodiment 
of the present invention; and 

FIG. 6 illustrates an experimental environment used to compare an adaptive 
beamformer according to the present invention and the conventional adaptive beamformer 
shown in FIG. 1. 

DETAILED DESCRIPTION OF THE EMBODIMENTS 

[0022] Reference will now be made in detail to the embodiments of the present invention, 
examples of which are illustrated in the accompanying drawings, wherein like reference 
numerals refer to the like elements throughout. The embodiments are described below to 
explain the present invention by referring to the figures. 

[0023] Hereinafter, embodiments of the present invention will be described in detail with 
reference to the attached drawings. Meanwhile, "speech" used hereinafter is a representation 
implicitly including any target signal necessary for using the present invention. 

[0024] FIG. 3 is a circuit diagram for explaining a feedback structure according to an 
embodiment of the present invention. The feedback structure includes an adaptive blocking 
filter (ABF) 31, a first subtractor 33, an adaptive canceling filter (ACF) 35, and a second 
subtractor 37. 

[0025] Referring to FIG. 3, the ABF 31 adaptively filters a signal y(k) output from the second 
subtractor 37 according to a signal output from the first subtractor 33 so that a characteristic of 
speech components of the filtered signal output from the ABF 31 is the same as that of speech 
components of a microphone signal x'_m(k), where m is an integer between 1 and M, that is 
delayed for a predetermined period of time. The first subtractor 33 subtracts a signal output from 
the ABF 31 from a signal x_m(k-D_m), i.e., x'_m(k), obtained by delaying a signal x_m(k) input to an m-th 
microphone among M microphones, where M is an integer greater than or equal to 2, for a 
predetermined period of time D_m. As a result, the first subtractor 33 outputs only a pure noise 
signal N contained in the signal x_m(k). 

[0026] The ACF 35 adaptively filters a signal z_m(k) output from the first subtractor 33 
according to a signal output from the second subtractor 37 so that a characteristic of noise 
components of the filtered signal output from the ACF 35 is the same as that of noise 
components of the signal b(k) output from the FBF 11 shown in FIG. 1. The second subtractor 37 
subtracts the signal output from the ACF 35 from the signal b(k). Thus, the second subtractor 
37 outputs only a pure speech signal S derived from the signal b(k) in which noise components 
are cancelled. 

[0027] FIG. 4 is a block diagram of an adaptive beamformer according to an embodiment of 
the present invention. The adaptive beamformer includes a fixed beamformer (FBF) 410 and a 
multi-channel signal separator 430. The FBF 410 includes a microphone array 411 having M 
microphones 411a, 411b, and 411c, a time delay estimator 413, a delayer 415 having M delay 
devices 415a, 415b and 415c, and a first adder 417. The multi-channel signal separator 430 
includes a first filter 431 having M ABFs 431a and 431b, a first subtractor 433 having M 
subtractors 433a and 433b, a second filter 435 having M ACFs 435a and 435b, a second 
subtractor 437 having M subtractors 437a and 437b, and a second adder 439. 

[0028] Referring to FIG. 4, in the FBF 410, the microphone array 411 receives speech 
signals x_1(k), x_2(k), and x_M(k) via the M microphones 411a, 411b and 411c. The time delay 
estimator 413 obtains the correlation of the speech signals x_1(k), x_2(k) and x_M(k) and calculates 
time delays D_1, D_2, and D_M of the speech signals x_1(k), x_2(k) and x_M(k). The M delay devices 
415a, 415b and 415c of the delayer 415 respectively delay the speech signals x_1(k), x_2(k) and 
x_M(k) by the time delays D_1, D_2 and D_M calculated by the time delay estimator 413, and output 
speech signals x'_1(k), x'_2(k) and x'_M(k). Here, the time delay estimator 413 may calculate time 
delays of speech signals using various methods besides the calculation of the correlation. 
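
As an illustration of the correlation-based estimation described for the time delay estimator 413, the following is a minimal sketch only. The function name `estimate_delay`, the brute-force search, and the `max_lag` search bound are assumptions made for this example and are not taken from the specification.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which `sig` best matches `ref`.

    A positive result means `sig` is a delayed copy of `ref`.
    The search range [-max_lag, max_lag] is an assumed parameter.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of ref(k) with sig(k + lag) over the overlap.
        score = 0.0
        for k in range(len(ref)):
            j = k + lag
            if 0 <= j < len(sig):
                score += ref[k] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical two-microphone example: the second signal lags the first by 3 samples.
x1 = [0.0, 1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0]
x2 = [0.0, 0.0, 0.0] + x1[:5]
lag = estimate_delay(x1, x2, max_lag=5)
```

In a delay-and-sum arrangement, the estimated lags would then be turned into per-channel delays so that all channels line up with the latest-arriving one.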

[0029] The first adder 417 adds the speech signals x'_1(k), x'_2(k) and x'_M(k) and outputs a 
signal b(k). The signal b(k) output from the first adder 417 can be represented as in Equation 1. 

b(k) = \sum_{m=1}^{M} x'_m(k) ...(1) 
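
The delay compensation and summation of Equation 1 can be sketched as follows. This is an illustrative sketch assuming integer sample delays and zero-padding at the start of each delayed signal; the helper names `delay` and `delay_and_sum` are hypothetical.

```python
def delay(signal, d):
    """Return x(k - d) for d >= 0 samples, zero-padding the start."""
    return [0.0] * d + signal[: len(signal) - d]


def delay_and_sum(signals, delays):
    """b(k) = sum over m of x'_m(k), where x'_m(k) = x_m(k - D_m)."""
    aligned = [delay(x, d) for x, d in zip(signals, delays)]
    return [sum(samples) for samples in zip(*aligned)]


# Hypothetical example: microphone 2 hears the source 2 samples earlier,
# so it is delayed by D_2 = 2 to line up with microphone 1 (D_1 = 0).
x1 = [0.0, 0.0, 1.0, 2.0, 0.0, 0.0]
x2 = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
b = delay_and_sum([x1, x2], [0, 2])
```

After alignment the speech samples add coherently, which is why the sum signal b(k) has a better SNR than any single microphone signal.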

[0030] In the multi-channel signal separator 430, the M ABFs 431a and 431b adaptively filter 
signals output from the M subtractors 437a and 437b of the second subtractor 437 according to 
signals output from the M subtractors 433a and 433b of the first subtractor 433, so that a 
characteristic of speech components of the filtered signals output from the M ABFs 431a and 
431b is the same as that of speech components of a microphone signal x'_m(k) that is delayed 
for a predetermined period of time. 

[0031] The M subtractors 433a and 433b of the first subtractor 433 respectively subtract the 
signals output from the M ABFs 431a and 431b from the speech signals x'_1(k) and x'_M(k), and 
respectively output signals u_1(k) and u_M(k) to the M ACFs 435a and 435b. When a coefficient 
vector of the m-th ABF of the first filter 431 is h_m(k) and the number of taps is L, the signal u_m(k) 
output from the subtractors 433a and 433b of the first subtractor 433 can be represented as in 
Equation 2. 

u_m(k) = x'_m(k) - \mathbf{h}_m^T(k) \mathbf{w}_m(k) ...(2) 
wherein \mathbf{h}_m(k) and \mathbf{w}_m(k) can be represented as in Equations 3 and 4, respectively. 

\mathbf{h}_m(k) = [h_{m,1}(k), h_{m,2}(k), ..., h_{m,L}(k)]^T ...(3) 
wherein h_{m,l}(k) is the l-th coefficient of \mathbf{h}_m(k). 

\mathbf{w}_m(k) = [w_m(k-1), w_m(k-2), ..., w_m(k-L)]^T ...(4) 

wherein \mathbf{w}_m(k) denotes a vector collecting L past values of w_m(k), the signal output from the 
m-th subtractor of the second subtractor 437, and L denotes the number of filter 
taps of the M ABFs 431a and 431b. 

[0032] The M ACFs 435a and 435b of the second filter 435 adaptively filter the signals u_1(k) 
and u_M(k) output from the M subtractors 433a and 433b of the first subtractor 433 according to 
signals output from the M subtractors 437a and 437b of the second subtractor 437, so that a 
characteristic of noise components of the filtered signals output from the M ACFs 435a and 
435b is the same as that of noise components of the signal b(k) output from the FBF 410. 

[0033] The M subtractors 437a and 437b of the second subtractor 437 respectively subtract 
the signals output from the M ACFs 435a and 435b of the second filter 435 from the signal b(k) 
output from the FBF 410, and output w_1(k) and w_M(k) to the second adder 439. When a 
coefficient vector of the m-th ACF of the second filter 435 is g_m(k) and the number of taps is N, 
the signal w_m(k) output from the M subtractors 437a and 437b of the second subtractor 437 can 
be represented as in Equation 5. 

w_m(k) = b(k) - \mathbf{g}_m^T(k) \mathbf{u}_m(k) ...(5) 
wherein \mathbf{g}_m(k) and \mathbf{u}_m(k) can be represented as in Equations 6 and 7, respectively. 

\mathbf{g}_m(k) = [g_{m,1}(k), g_{m,2}(k), ..., g_{m,N}(k)]^T ...(6) 
wherein g_{m,n}(k) denotes the n-th coefficient of \mathbf{g}_m(k). 

\mathbf{u}_m(k) = [u_m(k-1), u_m(k-2), ..., u_m(k-N)]^T ...(7) 

wherein \mathbf{u}_m(k) denotes a vector collecting N past values of u_m(k) and N denotes the number of 
filter taps of the M ACFs 435a and 435b. 

[0034] The second adder 439 adds w_1(k) and w_M(k) output from the M subtractors 437a and 
437b of the second subtractor 437 and outputs a signal y(k) in which noise components are 
cancelled. The signal y(k) output from the second adder 439 can be represented as in Equation 
8. 

y(k) = \sum_{m=1}^{M} w_m(k) ...(8) 
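
The per-sample signal flow of the multi-channel signal separator 430 (Equations 2, 5 and 8) can be sketched as below. This is a hedged illustration, not the claimed implementation: the function name `fig4_step`, the list-based buffers, and the helper `dot` are assumptions, and coefficient adaptation is deliberately left out.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def fig4_step(x_delayed, b, h, g, w_past, u_past):
    """Compute u_m(k), w_m(k) and y(k) for one time index k.

    x_delayed[m] : x'_m(k), delay-compensated microphone sample
    b            : b(k), sum signal from the fixed beamformer
    h[m], g[m]   : ABF and ACF coefficient vectors (lengths L and N)
    w_past[m]    : [w_m(k-1), ..., w_m(k-L)]
    u_past[m]    : [u_m(k-1), ..., u_m(k-N)]
    """
    M = len(h)
    u = [x_delayed[m] - dot(h[m], w_past[m]) for m in range(M)]  # Equation 2
    w = [b - dot(g[m], u_past[m]) for m in range(M)]             # Equation 5
    y = sum(w)                                                   # Equation 8
    return u, w, y


# Hypothetical values for M = 2 channels and L = N = 2 taps.
u, w, y = fig4_step(
    x_delayed=[3.0, 5.0], b=10.0,
    h=[[0.5, 0.0], [0.0, 0.5]], g=[[1.0, 0.0], [0.0, 0.0]],
    w_past=[[2.0, 4.0], [6.0, 8.0]], u_past=[[1.0, 1.0], [3.0, 3.0]],
)
```

Because both filters read only past values of the other branch's output, the two branches can be evaluated in either order within a time step.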

[0035] FIG. 5 is a block diagram of an adaptive beamformer according to another 
embodiment of the present invention. Referring to FIG. 5, the adaptive beamformer includes a 
FBF 510 and a multi-channel signal separator 530. The FBF 510 includes a microphone array 
511 having M microphones 511a, 511b and 511c, a time delay estimator 513, a delayer 515 
having M delay devices 515a, 515b and 515c, and a first adder 517. The multi-channel signal 
separator 530 includes a first filter 531 having M ABFs 531a, 531b, and 531c, a first 
subtractor 533 having M subtractors 533a, 533b and 533c, a second filter 535 having M ACFs 
535a, 535b and 535c, a second adder 537, and a second subtractor 539. Here, the structure 
and operation of the FBF 510 are the same as those of the FBF 410 shown in FIG. 4, and thus 
will not be described herein; only the multi-channel separator 530 will be described. 

[0036] Referring to FIG. 5, in the multi-channel signal separator 530, the M ABFs 531a, 531b 
and 531c of the first filter 531 adaptively filter a signal y(k) output from the second subtractor 
539 according to signals output from the M subtractors 533a, 533b and 533c of the first 
subtractor 533, so that a characteristic of speech components of the filtered signals output from 
the M ABFs 531a, 531b and 531c is the same as that of speech components of a microphone 
signal x'_m(k) that is delayed for a predetermined period of time. 

[0037] The M subtractors 533a, 533b and 533c of the first subtractor 533 respectively 
subtract the signals output from the ABFs 531a, 531b and 531c from the microphone signals x'_1(k), 
x'_2(k) and x'_M(k) delayed for a predetermined period of time and output signals z_1(k), z_2(k) and 
z_M(k) to the M ACFs 535a, 535b and 535c of the second filter 535. When a coefficient vector of 
the m-th ABF of the first filter 531 is h_m(k) and the number of taps is L, the signal z_m(k) output 
from the M subtractors 533a, 533b and 533c of the first subtractor 533 can be represented as in 
Equation 9. 

z_m(k) = x'_m(k) - \mathbf{h}_m^T(k) \mathbf{y}(k), m = 1, ..., M ...(9) 
wherein \mathbf{h}_m(k) and \mathbf{y}(k) can be represented as in Equations 10 and 11, respectively. 

\mathbf{h}_m(k) = [h_{m,1}(k), h_{m,2}(k), ..., h_{m,L}(k)]^T ...(10) 
wherein h_{m,l}(k) denotes the l-th coefficient of \mathbf{h}_m(k). 

\mathbf{y}(k) = [y(k-1), y(k-2), ..., y(k-L)]^T ...(11) 

wherein \mathbf{y}(k) denotes a vector collecting L past values of y(k) and L denotes the number of filter 
taps of the M ABFs 531a, 531b and 531c. 

[0038] The M ACFs 535a, 535b and 535c of the second filter 535 adaptively filter the signals 
z_1(k), z_2(k) and z_M(k) output from the M subtractors 533a, 533b and 533c of the first subtractor 
533 according to a signal output from the second subtractor 539, so that a characteristic of 
noise components of a signal v(k) output from the second adder 537 is the same as that of 
noise components of the signal b(k) output from the FBF 510. 

[0039] The second adder 537 adds the signals output from the M ACFs 535a, 535b and 
535c. When a coefficient vector of the m-th ACF of the second filter 535 is g_m(k) and the number of taps 
is N, a signal v(k) output from the second adder 537 can be represented as in Equation 12. 

v(k) = \sum_{m=1}^{M} \mathbf{g}_m^T(k) \mathbf{z}_m(k) ...(12) 

wherein \mathbf{g}_m(k) and \mathbf{z}_m(k) can be represented as in Equations 13 and 14, respectively. 

\mathbf{g}_m(k) = [g_{m,1}(k), g_{m,2}(k), ..., g_{m,N}(k)]^T ...(13) 

wherein g_{m,n}(k) denotes the n-th coefficient of \mathbf{g}_m(k). 

\mathbf{z}_m(k) = [z_m(k-1), z_m(k-2), ..., z_m(k-N)]^T ...(14) 

wherein \mathbf{z}_m(k) denotes a vector collecting N past values of z_m(k) and N denotes the number of 
filter taps of the M ACFs 535a, 535b and 535c. 

[0040] The second subtractor 539 subtracts the signal v(k) output from the second adder 537 
from the signal b(k) output from the FBF 510 and outputs the signal y(k). The signal y(k) output 
from the second subtractor 539 can be represented as in Equation 15. 

y(k) = b(k)-v(k) ...(15) 
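
The per-sample signal flow of the multi-channel signal separator 530 (Equations 9, 12 and 15) can be sketched in the same way. This is only an illustration under assumed names (`fig5_step`, `dot`) and list-based buffers; coefficient adaptation is omitted.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def fig5_step(x_delayed, b, h, g, y_past, z_past):
    """Compute z_m(k), v(k) and y(k) for one time index k.

    x_delayed[m] : x'_m(k), delay-compensated microphone sample
    b            : b(k), sum signal from the fixed beamformer
    h[m], g[m]   : ABF and ACF coefficient vectors (lengths L and N)
    y_past       : [y(k-1), ..., y(k-L)], shared by all ABFs
    z_past[m]    : [z_m(k-1), ..., z_m(k-N)]
    """
    M = len(h)
    z = [x_delayed[m] - dot(h[m], y_past) for m in range(M)]  # Equation 9
    v = sum(dot(g[m], z_past[m]) for m in range(M))           # Equation 12
    y = b - v                                                 # Equation 15
    return z, v, y


# Hypothetical values for M = 2 channels and L = N = 2 taps.
z, v, y = fig5_step(
    x_delayed=[4.0, 5.0], b=10.0,
    h=[[0.5, 0.0], [0.0, 1.0]], g=[[1.0, 0.0], [0.0, 2.0]],
    y_past=[2.0, 3.0], z_past=[[1.0, 1.0], [1.0, 1.0]],
)
```

Compared with the FIG. 4 arrangement, a single output y(k) is fed back to every ABF, so only one history buffer of y is needed.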

[0041] In the above-described embodiments, the M ABFs 431a and 431b of the first filter 
431, the M ABFs 531a, 531b and 531c of the first filter 531, the M ACFs 435a and 435b of the 
second filter 435, and the M ACFs 535a, 535b and 535c of the second filter 535 illustrated in 
FIGS. 4 and 5, respectively, may be FIR filters. In view of its own inputs and outputs, each of the filters 
is an FIR filter. However, the multi-channel signal separators 430 and 530 may be regarded as 
infinite impulse response (IIR) filters in view of their inputs, i.e., the signal b(k) output from the FBFs 
410 and 510 and the microphone signals x'_1(k), x'_2(k) and x'_M(k) delayed for a predetermined 
period of time, and their outputs, i.e., the signal y(k) output from the second adder 439 shown in FIG. 
4 and the second subtractor 539 shown in FIG. 5. This is because the M ABFs 431a and 431b 
and the M ABFs 531a, 531b and 531c of the first filters 431 and 531 and the M ACFs 435a and 
435b and the M ACFs 535a, 535b and 535c of the second filters 435 and 535 have a feedback 
connection structure. 
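
The IIR behavior of FIR filters connected in a feedback loop can be seen in a small numerical experiment. The sketch below is a deliberately reduced case and not the disclosed apparatus: a single channel, one fixed tap per filter, a unit impulse applied to b(k), and the microphone input held at zero are all assumptions made for illustration.

```python
def loop_impulse_response(h, g, n_samples):
    """Response y(k) of a one-channel feedback loop to a unit impulse on b(k).

    z(k) = x'(k) - h * y(k-1)   (one-tap ABF, x'(k) assumed zero here)
    y(k) = b(k)  - g * z(k-1)   (one-tap ACF)
    """
    y_prev = 0.0  # y(k-1)
    z_prev = 0.0  # z(k-1)
    out = []
    for k in range(n_samples):
        b = 1.0 if k == 0 else 0.0
        z = 0.0 - h * y_prev
        y = b - g * z_prev
        out.append(y)
        y_prev, z_prev = y, z
    return out


# Each filter has a single tap, yet the response never dies out exactly:
# it decays geometrically as (g*h)^j at every second sample.
response = loop_impulse_response(h=0.5, g=0.5, n_samples=12)
```

A purely feedforward cascade of the same two one-tap filters would return to exactly zero after a few samples, which is the sense in which the loop behaves as an IIR filter.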

[0042] Coefficients of the FIR filters are updated by the information maximization algorithm 
proposed by Anthony J. Bell. The information maximization algorithm is a statistical learning 
rule well known in the field of independent component analysis, by which non-Gaussian data 
structures of latent sources are found from sensor array observations on the assumption that 
the latent sources are statistically independent. Because the information maximization 
algorithm does not need a voice activity detector (VAD), coefficients of ABFs and ACFs can be 
automatically adapted without knowledge of the desired and undesired signal levels. 

[0043] According to the information maximization algorithm, coefficients of the M ABFs 431a 
and 431b and the M ACFs 435a and 435b are updated as in Equations 16 and 17. 

h_{m,l}(k+1) = h_{m,l}(k) + \alpha SGN(u_m(k)) w_m(k-l) ...(16) 

g_{m,n}(k+1) = g_{m,n}(k) + \beta SGN(w_m(k)) u_m(k-n) ...(17) 

wherein \alpha and \beta denote step sizes for the learning rules and SGN(·) is a sign function which is +1 if 
an input is greater than zero and -1 if the input is less than zero. 
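
The sign-based coefficient updates of Equations 16 and 17 can be sketched for one channel m as follows. The function names are hypothetical, and returning 0 from `sgn` for a zero input is an assumption, since the text only defines the sign function for non-zero inputs.

```python
def sgn(v):
    """+1 for positive input, -1 for negative input (0 at zero is an assumption)."""
    if v > 0:
        return 1.0
    if v < 0:
        return -1.0
    return 0.0


def update_fig4(h_m, g_m, u_m, w_m, w_past, u_past, alpha, beta):
    """One update of the m-th ABF and ACF coefficient vectors.

    w_past = [w_m(k-1), ..., w_m(k-L)], u_past = [u_m(k-1), ..., u_m(k-N)],
    so list index i holds the sample paired with coefficient i + 1.
    """
    new_h = [c + alpha * sgn(u_m) * w_past[i] for i, c in enumerate(h_m)]  # Eq. 16
    new_g = [c + beta * sgn(w_m) * u_past[i] for i, c in enumerate(g_m)]   # Eq. 17
    return new_h, new_g


# Hypothetical values: L = 2 ABF taps, N = 1 ACF tap.
new_h, new_g = update_fig4(
    h_m=[0.0, 0.0], g_m=[1.0], u_m=-2.0, w_m=3.0,
    w_past=[0.5, -1.0], u_past=[2.0], alpha=0.1, beta=0.5,
)
```

Only the sign of the current output enters each update, so the step magnitude is set by the step size and the stored past samples rather than by the output level.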

[0044] According to the information maximization algorithm, coefficients of the M ABFs 531a, 
531b and 531c and the M ACFs 535a, 535b and 535c are updated as in Equations 18 and 19. 

h_{m,l}(k+1) = h_{m,l}(k) + \alpha SGN(z_m(k)) y(k-l) ...(18) 

g_{m,n}(k+1) = g_{m,n}(k) + \beta SGN(y(k)) z_m(k-n) ...(19) 

wherein \alpha and \beta denote step sizes for the learning rules and SGN(·) is a sign function which is +1 if 
an input is greater than zero and -1 if the input is less than zero. The sign function SGN(·) 
could be replaced by any kind of saturation function, such as a sigmoid function or a tanh(·) 
function. 
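
The updates of Equations 18 and 19, with the saturation function made interchangeable, might look like the sketch below. The parameter name `sat` and the function names are assumptions for illustration; `math.tanh` is used as one example of the saturation functions mentioned above.

```python
import math


def sgn(v):
    """+1 for positive input, -1 for negative input (0 at zero is an assumption)."""
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def update_fig5(h_m, g_m, z_m, y, y_past, z_past, alpha, beta, sat=sgn):
    """One update of the m-th ABF and ACF coefficient vectors.

    y_past = [y(k-1), ..., y(k-L)], z_past = [z_m(k-1), ..., z_m(k-N)].
    `sat` is the saturation function: sgn by default, or e.g. math.tanh.
    """
    new_h = [c + alpha * sat(z_m) * y_past[i] for i, c in enumerate(h_m)]  # Eq. 18
    new_g = [c + beta * sat(y) * z_past[i] for i, c in enumerate(g_m)]     # Eq. 19
    return new_h, new_g


# Hypothetical values: L = 2 ABF taps, N = 1 ACF tap.
h_sign, g_sign = update_fig5([0.0, 0.0], [0.0], 1.5, -3.0,
                             [2.0, -4.0], [8.0], alpha=0.25, beta=0.125)
h_tanh, g_tanh = update_fig5([0.0, 0.0], [0.0], 1.5, -3.0,
                             [2.0, -4.0], [8.0], alpha=0.25, beta=0.125,
                             sat=math.tanh)
```

With tanh the update shrinks smoothly as the output approaches zero, whereas the sign function always applies a full step.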

[0045] In addition, coefficients of the M ABFs 431a and 431b, the M ABFs 531a, 531b and 
531c, the M ACFs 435a and 435b, and the M ACFs 535a, 535b and 535c can be updated using any 
kind of statistical learning algorithms such as a least square algorithm and its variant, a 
normalized least square algorithm. 

[0046] As described above, when the M ABFs 431a and 431b and the M ACFs 435a and 
435b, and the M ABFs 531a, 531b and 531c and the M ACFs 535a, 535b and 535c are FIR 
filters and connected in a feedback structure, and the number of microphones of each of the 
microphone arrays 411 and 511 is 8, the number of filter taps of the adaptive beamformer shown 
in FIG. 4 or 5 is 8x(128+128)=2048, which is much fewer than the number 8x(512+128)=5120 
of filter taps of the conventional adaptive beamformer shown in FIG. 1. 
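
The tap counts above can be checked with a few lines of arithmetic. In this sketch the conventional figure is read as 8 x (512 + 128), the only split of the two per-channel filter lengths consistent with the stated total of 5120; which of the two lengths belongs to the ABF and which to the ACF is not stated, so the parameter names are generic.

```python
def total_taps(num_mics, taps_a, taps_b):
    """Total number of filter taps for two adaptive filters per microphone."""
    return num_mics * (taps_a + taps_b)


feedback_taps = total_taps(8, 128, 128)      # structure of FIG. 4 or FIG. 5
conventional_taps = total_taps(8, 512, 128)  # conventional structure of FIG. 1
saving = 1.0 - feedback_taps / conventional_taps
```

Under these numbers the feedback structure needs 60 percent fewer taps than the conventional structure.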

[0047] FIG. 6 illustrates an experimental environment used for comparing an adaptive 
beamformer according to the present invention and the conventional adaptive beamformer 
shown in FIG. 1. A circular microphone array having a diameter of 30 cm was located in the 
center of a room having a length of 6.5m, a width of 4.1m, and a height of 3.5m. Eight 
microphones were installed on the circular microphone array equidistant from adjacent 
microphones. The heights of the microphone array, a target speaker, and a noise speaker were 
all 0.79m from the floor. Target sources were speech waves of 40 words pronounced by four 
male speakers, and noise sources were a fan and music. 

[0048] The results of an objective evaluation of the performance of the two adaptive 
beamformers in the above-described experimental environment, e.g., a comparison of SNRs, 
are shown in Table 1 (all values are in dB). 

[0049] Table 1 

             Raw Signal    Prior Art (GSC)    Present Invention 
FAN             9.0             19.5                27.5 
MUSIC           6.9             15.5                24.9 
Δ FAN            X              10.5                18.5 
Δ MUSIC          X               8.6                18.0 


[0050] As can be seen in Table 1, the SNR improvement over the raw signal achieved by a beamforming 
method according to the present invention (18.5 dB and 18.0 dB) is roughly double that achieved by a 
beamforming method according to the prior art (10.5 dB and 8.6 dB). 
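
The comparison drawn from Table 1 can be reproduced directly from the tabulated SNR values. The sketch below only restates the table; the dictionary layout and the names `snr_db` and `improvements` are choices made for this example.

```python
# SNR values from Table 1, in dB.
snr_db = {
    "FAN": {"raw": 9.0, "gsc": 19.5, "proposed": 27.5},
    "MUSIC": {"raw": 6.9, "gsc": 15.5, "proposed": 24.9},
}


def improvements(row):
    """Return (prior-art gain, present-invention gain) over the raw signal, in dB."""
    return row["gsc"] - row["raw"], row["proposed"] - row["raw"]


for noise, row in snr_db.items():
    gsc_gain, proposed_gain = improvements(row)
    ratio = proposed_gain / gsc_gain
    print(f"{noise}: GSC +{gsc_gain:.1f} dB, proposed +{proposed_gain:.1f} dB, "
          f"ratio {ratio:.2f}")
```

Since the values are in dB, "roughly double" refers to the improvement in dB over the raw signal, not to a doubling of the linear power ratio.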



[0051] For a subjective evaluation in the experimental environment, e.g., an AB preference 
test, after ten people had listened to outputs of a beamformer according to the prior art and a 
beamformer according to the present invention, they were asked to choose one of the following 
sentences for evaluation, which are "A is much better than B", "A is better than B", "A and B are 
the same", "A is worse than B", and "A is much worse than B". A test program randomly 
determined which one of the beamformers according to the prior art and the present invention 
would output signal A. Also, two points were given for "much better", one point for "better", and 
no points for "the same" and then the results were summed. The subjective evaluation 
compared 40 words for fan noise and another 40 words for music noise, and the results of the 
comparison are shown in Table 2. 
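
One way to tally such an AB preference test is sketched below. The text states the points for "much better", "better" and "the same" but does not say how "worse" answers are scored; this sketch assumes that "A is worse than B" credits the system presented as B with the corresponding points. That rule, and all names in the code, are assumptions.

```python
POINTS = {"much better": 2, "better": 1, "same": 0}
OTHER = {"prior_art": "present_invention", "present_invention": "prior_art"}


def tally(responses):
    """Sum points per system.

    responses: list of (system_presented_as_A, answer) pairs, where answer
    describes A relative to B: "much better", "better", "same", "worse"
    or "much worse".
    """
    scores = {"prior_art": 0, "present_invention": 0}
    for system_a, answer in responses:
        if answer in ("worse", "much worse"):
            # Assumed rule: a "worse" verdict for A is a "better" verdict for B.
            winner = OTHER[system_a]
            key = answer.replace("worse", "better")
        else:
            winner, key = system_a, answer
        scores[winner] += POINTS[key]
    return scores


# Hypothetical answers from four trials.
scores = tally([
    ("prior_art", "much worse"),
    ("present_invention", "better"),
    ("prior_art", "same"),
    ("prior_art", "better"),
])
```

With ten listeners and 40 words per noise type, each row of the results corresponds to 400 such judgments.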



[0052] Table 2 

             Prior Art (GSC)    Present Invention 
FAN               78                  517 
MUSIC            140                  284 


[0053] As can be seen in Table 2, the outputs of the beamformer according to the present 
invention are superior to the outputs of the beamformer according to the prior art. 

[0054] As described above, according to the present invention, by connecting ABFs and 
ACFs in a feedback structure, noise components contained in a wideband speech signal input 
via a microphone array comprising at least two microphones can be nearly completely 
cancelled. Also, while the ABFs and the ACFs have been realized as FIR filters and connected 
in a feedback structure, the ABFs and the ACFs may be regarded as IIR filters, which reduces 
the number of filter taps. In addition, since an information maximization algorithm can be used 
to learn coefficients of the ABFs and the ACFs, the number of parameters necessary for 
learning can be reduced and a VAD for detecting whether speech signals exist is not necessary. 

[0055] Moreover, a method and apparatus for adaptive beamforming according to the present 
invention are not greatly affected by the size, arrangement, or structure of a microphone array. 
Also, the method and apparatus are 
more robust against look-direction errors than the conventional art, regardless of the type of 
noise. 

[0056] The present invention can be realized as a computer-readable code on a computer- 
readable recording medium. Such a computer-readable medium may be any kind of recording 
medium in which computer-readable data is stored. Examples of such computer-readable 
media include ROMs, RAMs, CD-ROMs, magnetic tapes, floppy discs, optical data storing 
devices, and carrier waves (e.g., transmission via the Internet), and so forth. Also, the 
computer-readable code can be stored on the computer-readable media distributed in 
computers connected via a network. Furthermore, functional programs, codes, and code 
segments for realizing the present invention can be easily analogized by programmers skilled in 
the art. 

[0057] Moreover, a method and apparatus for adaptive beamforming according to the present 
invention can be applied to autonomous mobile robots to which microphone arrays are 
attached, and to vocal communication with electronic devices in an environment where a user is 
distant from a microphone. Examples of such electronic devices include personal digital 
assistants (PDA), WebPads, and portable phone terminals in automobiles, having a small 
number of microphones. With the present invention, the performance of a voice recognizer can 
be considerably improved. 

[0058] Although a few embodiments of the present invention have been shown and 
described, it would be appreciated by those skilled in the art that changes may be made in these 
embodiments without departing from the principles and spirit of the invention, the scope of which 
is defined in the claims and their equivalents. 